We wanted to explore for possible links between civilian ownership of firearms and total homicides - not just firearm-related homicides.
This has been looked at before, but surprisingly most analysis does not control for income, which is a major confounding factor at the country level.
TODO review of blogs and literature that have looked at the same data.
The indicators2005 data was developed for this purpose. Here’s the first five rows of that dataset:
| region | Pop2005 | country | worldbankincomegroup | Alcohol | Alpha_2 | Alpha_3 | FemaleSuicide | MaleSuicide | Suicide | giniyear | gini | gnpyear | GNPPerCapitaPPP | homicideyear | Homicide | Average total all civilian firearms | FirearmsPer100People2005 | HDI | rich |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Europe | 3833400 | Bosnia and Herzegovina | Upper-middle-income | 9.6 | BA | BIH | NA | NA | NA | 2004 | 34.04 | 2005 | 6680 | 2009 | 1.9 | 675000 | 17.6083894 | 0.697 | FALSE |
| Africa | 4055600 | Central African Republic | Low-income | 1.6 | CF | CAF | NA | NA | NA | 2003 | 43.61 | 2005 | 730 | 2012 | 13.2 | 40000 | 0.9862906 | 0.325 | FALSE |
| Europe | 1032600 | Cyprus | High-income | 9.3 | CY | CYP | NA | NA | NA | 2005 | 30.26 | 2005 | 27060 | 2005 | 1.9 | 275000 | 26.6318032 | 0.830 | FALSE |
| Americas | 9237600 | Dominican Republic | Upper-middle-income | 5.8 | DO | DOM | NA | NA | NA | 2005 | 49.96 | 2005 | 7380 | 2005 | 25.9 | 450000 | 4.8713952 | 0.676 | FALSE |
| Africa | 1377800 | Gabon | Upper-middle-income | 7.9 | GA | GAB | NA | NA | NA | 2005 | 42.18 | 2005 | 13860 | 2012 | 9.4 | 190000 | 13.7901002 | 0.644 | FALSE |
We’re interested in the Homicide column, which was sourced from the World Development Indicators. This variable has a skewed distribution:
ggplot(indicators2005, aes(x = Homicide, fill = region, colour = region)) +
geom_density(alpha = 0.5) +
geom_rug() +
ggtitle("Homicides per 100,000 population") +
labs(caption = "Source: WDI")
Explain the firearms data.
Explain the HDI.
There’s a very faint negative relationship between our main explanatory variable of interest, FirearmsPer100People2005, and homicide rates. This is clearer on the logarithmic scale. The USA is such an outlier for firearms ownership on the raw scale. The (TODO: which right wing outfit?) used this to claim that gun ownership makes a country safer.
p1 <- ggplot(indicators2005, aes(x = FirearmsPer100People2005, y = Homicide, label = Alpha_3)) +
geom_smooth(method = "lm") +
geom_text_repel(colour = "white") +
geom_point() +
labs(x = "Civilian firearms per 100 people",
y = "Homicide per 100,000")
p1
p1 + scale_x_log10() + scale_y_log10()
There’s a much stronger negative relationship between national average income and homicides. Richer countries have lower homicide rates:
ggplot(indicators2005, aes(x = GNPPerCapitaPPP, y = Homicide, label = Alpha_3)) +
geom_smooth(method = "lm") +
geom_text_repel(colour = "white") +
geom_point() +
labs(x = "Gross National Income per person, purchasing power parity measure",
y = "Homicide per 100,000") +
scale_x_log10() +
scale_y_log10()
And there’s a strongly positive relationship between inequality and homicide rates:
ggplot(indicators2005, aes(x = gini, y = Homicide, label = Alpha_3)) +
geom_smooth(method = "lm") +
geom_text_repel(colour = "white") +
geom_point() +
labs(x = "Economic inequality, Gini coefficient (higher number is more unequal)",
y = "Homicide per 100,000") +
scale_y_log10()
Here’s a summary of the overall pair-wise relationships
indicators2005 %>%
mutate(LogGNI = log(GNPPerCapitaPPP),
LogFirearms = log(FirearmsPer100People2005),
LogHomicide = log(Homicide)) %>%
select(LogGNI, HDI, gini, LogFirearms, LogHomicide, Alcohol, rich) %>%
mutate(rich = ifelse(rich, "Most developed", "Less developed")) %>%
filter(!is.na(rich)) %>%
ggpairs(mapping = ggplot2::aes(color = rich))
I’m first going to fit a model with ordinary least squares because it has a high degree of familiarity and may give people confidence in what is to come. Then I’m going to use elastic net regularisation.
First I’m going to restrict mysefl to complete cases.
TODO - instead of doing this, do imputation, and integrate the imputation into the validation and confidence interval bootstraps.
HomicideData <- indicators2005 %>%
dplyr::select(Homicide, gini, FirearmsPer100People2005, HDI, country, Alcohol)
HomicideData <- HomicideData[complete.cases(HomicideData), ]
First the full model fit with OLS. I try two versions - one with an interaction between inequality and the HDI, and one without, and start with a global test. There’s evidence of an interaction:
model_full <- lm(log(Homicide) ~ gini * HDI + log(FirearmsPer100People2005) + Alcohol, data = HomicideData)
model_no_interactions <- lm(log(Homicide) ~ gini + HDI + log(FirearmsPer100People2005) + Alcohol, data = HomicideData)
anova(model_no_interactions, model_full)
## Analysis of Variance Table
##
## Model 1: log(Homicide) ~ gini + HDI + log(FirearmsPer100People2005) +
## Alcohol
## Model 2: log(Homicide) ~ gini * HDI + log(FirearmsPer100People2005) +
## Alcohol
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 131 84.411
## 2 130 77.651 1 6.7591 11.316 0.00101 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Here’s the estimates of the various coefficients and their standard errors:
| Dependent variable: | |
| log(Homicide) | |
| gini | -0.071* |
| (0.041) | |
| HDI | -10.111*** |
| (2.512) | |
| log(FirearmsPer100People2005) | 0.153** |
| (0.069) | |
| Alcohol | 0.011 |
| (0.023) | |
| gini:HDI | 0.208*** |
| (0.062) | |
| Constant | 5.325*** |
| (1.657) | |
| Observations | 136 |
| R2 | 0.498 |
| Adjusted R2 | 0.478 |
| Residual Std. Error | 0.773 (df = 130) |
| F Statistic | 25.755*** (df = 5; 130) |
| Note: | p<0.1; p<0.05; p<0.01 |
The p-value for the positive firearms effect on homicides is 0.02 or 0.03 whether the interaction between inequality and income is in or out.
Best way to get a confidence interval for the size of the firearms effect is with a bootstrap.
myfunction <- function(x, i){
the_data <- x[i, ]
bm <- lm(log(Homicide) ~ gini * HDI + log(FirearmsPer100People2005) + Alcohol, data = the_data)
return(coef(bm)["log(FirearmsPer100People2005)"])
}
full_model_boot <- boot(HomicideData, myfunction, R = 5000)
bc <- boot.ci(full_model_boot, type = "bca")
bc
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 5000 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = full_model_boot, type = "bca")
##
## Intervals :
## Level BCa
## 95% ( 0.0300, 0.2771 )
## Calculations and Intervals on Original Scale
With log transformation of both the response and explanatory variables, the coefficients are the % change in y for a 1% change in x. So we interpret this confidence interval a 1% increase in firearms held leads to between a 0.03% and 0.27% increase in homicides.
All ok:
par(mfrow = c(2, 2))
plot(model_full)
First we need to decide between the lasso (alpha = 1), ridge regression (alpha = 0), or between
library(glmnet)
X <- model.matrix(log(Homicide) ~ gini * HDI + log(FirearmsPer100People2005) + Alcohol - 1, data = HomicideData)
Y <- log(HomicideData$Homicide)
alphas <- 0:50 / 50
res <- rep(0, length(alphas))
for(i in 1:length(alphas)){
cvmod <- cv.glmnet(x = X, y = Y, alpha = alphas[i])
res[i] <- sqrt(min(cvmod$cvm))
}
plot(alphas, res) # so it doesn't matter much, but perhaps better towards 1 than zero?
Now we need a value for lambda which controls how much shrinkage done of the coefficients. Cross-validation still the best way to do this. We can get the best value for prediction, or the most aggressively shrinking value that gets prediction error within one standard error of the best.
cvmod <- cv.glmnet(x = X, y = Y, alpha = 0.9)
model_enr <- glmnet(x = X, y = Y, alpha = 0.9)
coefs <- round(cbind(
original = coef(model_full),
shrunk = coef(model_enr, s = cvmod$lambda.min),
veryshrunk = coef(model_enr, s = cvmod$lambda.1se)
), 3)
colnames(coefs) <- c("original", "shrunk", "veryshrunk")
coefs
## 6 x 3 sparse Matrix of class "dgCMatrix"
## original shrunk veryshrunk
## (Intercept) 5.325 4.497 -0.199
## gini -0.071 -0.051 0.052
## HDI -10.111 -8.833 -0.580
## log(FirearmsPer100People2005) 0.153 0.150 .
## Alcohol 0.011 0.007 .
## gini:HDI 0.208 0.176 .
So, the best performing model has Firearms as an explanatory variable, and is basically untouched by the elastic net regularization. But a more ruthless model, happy to trade off a bit of predictive power for simplicity, would only include inequality and level of development in predicting homicide levels.
There is evidence that higher civilian ownership of firearms in 2007 is related to higher overall homicide rates. However, the effect is not very large (about 0.16% increase in homicide for 1% increase in firearms owned), and is less important than either income/development (higher level of development leads to less homicides) or inequality (higher inequality leads to more homicides)